Axiomatic Attribution for Deep Networks

Authors

  • Mukund Sundararajan
  • Ankur Taly
  • Qiqi Yan
Abstract

We study the problem of attributing the prediction of a deep network to its input features, a problem previously studied by several other works. We identify two fundamental axioms, Sensitivity and Implementation Invariance, that attribution methods ought to satisfy. We show that they are not satisfied by most known attribution methods, which we consider to be a fundamental weakness of those methods. We use the axioms to guide the design of a new attribution method called Integrated Gradients. Our method requires no modification to the original network and is extremely simple to implement; it just needs a few calls to the standard gradient operator. We apply this method to a couple of image models, a couple of text models, and a chemistry model, demonstrating its ability to debug networks, to extract rules from a network, and to enable users to engage with models better.

1. Motivation and Summary of Results

We study the problem of attributing the prediction of a deep network to its input features.

Definition 1. Formally, suppose we have a function F : R^n → [0, 1] that represents a deep network, and an input x = (x_1, ..., x_n) ∈ R^n. An attribution of the prediction at input x relative to a baseline input x′ is a vector A_F(x, x′) = (a_1, ..., a_n) ∈ R^n, where a_i is the contribution of x_i to the prediction F(x).

For instance, in an object recognition network, an attribution method could tell us which pixels of the image were responsible for a certain label being picked (see Figure 2). The attribution problem was previously studied by various papers (Baehrens et al., 2010; Simonyan et al., 2013; Shrikumar et al., 2016; Binder et al., 2016; Springenberg et al., 2014). The intention of these works is to understand the input-output behavior of the deep network, which gives us the ability to improve it. Such understandability is critical to all computer programs, including machine learning models.

There are also other applications of attribution. Attributions could be used within a product driven by machine learning to provide a rationale for a recommendation. For instance, a deep network that predicts a condition based on imaging could help inform the doctor of the part of the image that resulted in the recommendation. This could help the doctor understand the strengths and weaknesses of the model and compensate for them. We give such an example in Section 6.2. Attributions could also be used by developers in an exploratory sense. For instance, we could use a deep network to extract insights that could then be used in a rule-based system. In Section 6.3, we give such an example.

A significant challenge in designing attribution techniques is that they are hard to evaluate empirically. As we discuss in Section 4, it is hard to tease apart errors that stem from the misbehavior of the model versus the misbehavior of the attribution method. To compensate for this shortcoming, we take an axiomatic approach. In Section 2 we identify two axioms that every attribution method must satisfy. Unfortunately, most previous methods do not satisfy one of these two axioms. In Section 3, we use the axioms to identify a new method, called integrated gradients.
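To make Definition 1 concrete, consider a toy linear model. The sketch below (our illustration, not from the paper) computes an attribution vector where each feature's contribution is its coefficient times its movement away from the baseline; all names here are ours.

```python
import numpy as np

# Toy instance of Definition 1 (illustrative only):
# a linear "network" F(x) = w . x, input x, baseline x'.
w = np.array([0.5, -1.0, 2.0])
F = lambda x: float(w @ x)

x = np.array([1.0, 1.0, 1.0])   # input at hand
x_base = np.zeros(3)            # baseline input

# For a linear model, a natural attribution is coefficient times
# the feature's change from the baseline: a_i = w_i * (x_i - x'_i).
a = w * (x - x_base)
print(a)                           # [ 0.5 -1.   2. ]
print(a.sum(), F(x) - F(x_base))   # the attributions add up to the score change
```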
Unlike previously proposed methods, integrated gradients do not need any instrumentation of the network, and can be computed easily using a few calls to the gradient operation, allowing even novice practitioners to easily apply the technique. In Section 6, we demonstrate the ease of applicability over several deep networks, including two image networks, two text processing networks, and a chemistry network. These applications demonstrate the use of our technique in either improving our understanding of the network, performing debugging, performing rule extraction, or aiding an end user in understanding the network's prediction.

Remark 1. Let us briefly examine the need for the baseline in the definition of the attribution problem. A common way for humans to perform attribution relies on counterfactual intuition. When we assign blame to a certain cause, we implicitly consider the absence of the cause as a baseline for comparing outcomes. In a deep network, we model the absence using a single baseline input. For most deep networks, a natural baseline exists in the input space where the prediction is neutral. For instance, in object recognition networks, it is the black image. The need for a baseline has also been pointed out by prior work on attribution (Shrikumar et al., 2016; Binder et al., 2016).

2. Two Fundamental Axioms

We now discuss two axioms (desirable characteristics) for attribution methods. We find that other feature attribution methods in the literature break at least one of the two axioms. These methods include DeepLift (Shrikumar et al., 2016; 2017), Layer-wise relevance propagation (LRP) (Binder et al., 2016), Deconvolutional networks (Zeiler & Fergus, 2014), and Guided back-propagation (Springenberg et al., 2014). As we will see in Section 3, these axioms will also guide the design of our method.

Gradients. For linear models, ML practitioners regularly inspect the products of the model coefficients and the feature values in order to debug predictions. Gradients (of the output with respect to the input) are a natural analog of the model coefficients for a deep network, and therefore the product of the gradients and feature values is a reasonable starting point for an attribution method (Baehrens et al., 2010; Simonyan et al., 2013); see the third column of Figure 2 for examples. The problem with gradients is that they break Sensitivity, a property that all attribution methods should satisfy.

2.1. Axiom: Sensitivity(a)

An attribution method satisfies Sensitivity(a) if, for every input and baseline that differ in one feature but have different predictions, the differing feature is given a non-zero attribution. (Later in the paper, we will have a part (b) to this definition.)

Gradients violate Sensitivity(a). For a concrete example, consider a one-variable, one-ReLU network, f(x) = 1 − ReLU(1 − x). Suppose the baseline is x = 0 and the input is x = 2. The function changes from 0 to 1, but because f becomes flat at x = 1, the gradient method gives an attribution of 0 to x. Intuitively, gradients break Sensitivity because the prediction function may flatten at the input and thus have zero gradient despite the function value at the input being different from that at the baseline. This phenomenon has been reported in previous work (Shrikumar et al., 2016). Practically, the lack of sensitivity causes gradients to focus on irrelevant features (see the "fireboat" example in Figure 2).
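The counterexample above is easy to check numerically. Below is a minimal sketch, assuming a central finite difference as a stand-in for the gradient operator; the helper names are ours.

```python
def f(x):
    # f(x) = 1 - ReLU(1 - x), the one-variable, one-ReLU network above
    return 1.0 - max(1.0 - x, 0.0)

def grad(fn, x, eps=1e-6):
    # central finite difference as a stand-in for the gradient operator
    return (fn(x + eps) - fn(x - eps)) / (2.0 * eps)

baseline, x = 0.0, 2.0
print(f(x) - f(baseline))   # 1.0 -- the prediction changes from 0 to 1
print(grad(f, x) * x)       # 0.0 -- yet gradient-times-input attributes nothing
```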
Other back-propagation based approaches. A second set of approaches involves back-propagating the final prediction score through each layer of the network down to the individual features. These include DeepLift, Layer-wise relevance propagation (LRP), Deconvolutional networks (DeConvNets), and Guided back-propagation. These methods differ in the specific back-propagation logic for various activation functions (e.g., ReLU, MaxPool, etc.).

Unfortunately, Deconvolutional networks (DeConvNets) and Guided back-propagation violate Sensitivity(a). This is because these methods back-propagate through a ReLU node only if the ReLU is turned on at the input. This makes the method similar to gradients, in that the attribution is zero for features with zero gradient at the input despite a non-zero gradient at the baseline. We defer the specific counterexamples to Appendix B.

Methods like DeepLift and LRP tackle the Sensitivity issue by employing a baseline, and in some sense try to compute "discrete gradients" instead of (instantaneous) gradients at the input. (The two methods differ in the specifics of how they compute the discrete gradient.) The idea is that a large, discrete step will avoid flat regions, avoiding a breakage of Sensitivity. Unfortunately, these methods violate a different requirement on attribution methods.

2.2. Axiom: Implementation Invariance

Two networks are functionally equivalent if their outputs are equal for all inputs, despite having very different implementations. Attribution methods should satisfy Implementation Invariance, i.e., the attributions are always identical for two functionally equivalent networks. To motivate this, notice that attribution can be colloquially defined as assigning the blame (or credit) for the output to the input features. Such a definition does not refer to implementation details.

We now discuss intuition for why DeepLift and LRP break Implementation Invariance; a concrete example is provided in Appendix B. First, notice that gradients are invariant to implementation. In fact, the chain rule for gradients,

$$\frac{\partial f}{\partial g} = \frac{\partial f}{\partial h} \cdot \frac{\partial h}{\partial g},$$

is essentially about implementation invariance. To see this, think of g and f as the input and output of a system, and h as some implementation detail of the system. The gradient of the output f with respect to the input g can be computed either directly by ∂f/∂g, ignoring the intermediate function h (the implementation detail), or by invoking the chain rule via h. This is exactly how back-propagation works.

Methods like LRP and DeepLift replace gradients with discrete gradients and still use a modified form of back-propagation to compose discrete gradients into attributions. Unfortunately, the chain rule does not hold for discrete gradients in general. Formally,

$$\frac{f(x_1) - f(x_0)}{g(x_1) - g(x_0)} \neq \frac{f(x_1) - f(x_0)}{h(x_1) - h(x_0)} \cdot \frac{h(x_1) - h(x_0)}{g(x_1) - g(x_0)},$$

and therefore these methods fail to satisfy Implementation Invariance.

If an attribution method fails to satisfy Implementation Invariance, the attributions are potentially sensitive to unimportant aspects of the models. For instance, if the network architecture has more degrees of freedom than needed to represent a function, then there may be two sets of values for the network parameters that lead to the same function. The training procedure can converge at either set of values depending on the initialization or for other reasons, but the underlying network function would remain the same. It is undesirable that attributions differ for such reasons.
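As a quick illustration of why gradients themselves are implementation invariant, consider two implementations of the same function, one routed through an extra intermediate value h. A minimal sketch, assuming PyTorch; the function and variable names are ours:

```python
import torch

def net_direct(x):
    # direct implementation: f(x) = 2 * ReLU(x)
    return 2.0 * torch.relu(x)

def net_indirect(x):
    # functionally equivalent implementation with an intermediate h
    h = torch.relu(x)   # the "implementation detail" h
    return h + h

for net in (net_direct, net_indirect):
    x = torch.tensor(3.0, requires_grad=True)
    net(x).backward()
    print(x.grad)       # tensor(2.) for both: the gradient ignores the detail h
```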
3. Our Method: Integrated Gradients

We are now ready to describe our technique. Intuitively, our technique combines the Implementation Invariance of gradients along with the Sensitivity of techniques like LRP or DeepLift.

Formally, suppose we have a function F : R^n → [0, 1] that represents a deep network. Specifically, let x ∈ R^n be the input at hand, and x′ ∈ R^n be the baseline input. For image networks, the baseline could be the black image, while for text models it could be the zero embedding vector.

We consider the straight-line path (in R^n) from the baseline x′ to the input x, and compute the gradients at all points along the path. Integrated gradients are obtained by accumulating these gradients. Specifically, integrated gradients are defined as the path integral of the gradients along the straight-line path from the baseline x′ to the input x.

The integrated gradient along the i-th dimension for an input x and baseline x′ is defined as follows; here, ∂F(x)/∂x_i is the gradient of F(x) along the i-th dimension:

$$\mathsf{IntegratedGrads}_i(x) ::= (x_i - x'_i) \times \int_{\alpha=0}^{1} \frac{\partial F\big(x' + \alpha \times (x - x')\big)}{\partial x_i}\, d\alpha.$$
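One straightforward way to compute this definition in practice is to approximate the integral by a Riemann sum of gradients at points spaced along the straight-line path. Below is a minimal sketch under that assumption; grad_F is a caller-supplied gradient oracle (our name, not the paper's) returning the gradient vector ∂F/∂x at a point.

```python
import numpy as np

def integrated_gradients(grad_F, x, baseline, steps=50):
    """Riemann-sum approximation of the path integral above.

    grad_F: callable returning the gradient vector dF/dx at a point
            (an assumed helper, e.g. one autodiff call per step).
    """
    # midpoints alpha in (0, 1) along the straight-line path
    alphas = (np.arange(steps) + 0.5) / steps
    total = np.zeros_like(x, dtype=float)
    for alpha in alphas:
        total += grad_F(baseline + alpha * (x - baseline))
    avg_grad = total / steps
    return (x - baseline) * avg_grad   # elementwise, per the definition above

# Usage on the one-ReLU example f(x) = 1 - ReLU(1 - x):
grad_f = lambda x: np.where(x < 1.0, 1.0, 0.0)   # df/dx
print(integrated_gradients(grad_f, np.array([2.0]), np.array([0.0])))
# ~[1.0]: integrated gradients restore the Sensitivity that plain gradients lose
```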
